asymptotically stable






Multistability of Self-Attention Dynamics in Transformers

Altafini, Claudio

arXiv.org Artificial Intelligence

In machine learning, self-attention dynamics is a continuous-time, multiagent-like model of the attention mechanism of transformers. In this paper we show that this dynamics is related to a multiagent version of the Oja flow, a dynamical system that computes the principal eigenvector of a matrix, which for transformers is the value matrix. We classify the equilibria of the "single-head" self-attention system into four classes: consensus, bipartite consensus, clustering, and polygonal equilibria. Multiple asymptotically stable equilibria from the first three classes often coexist in the self-attention dynamics. Interestingly, equilibria from the first two classes are always aligned with the eigenvectors of the value matrix, often but not exclusively with the principal eigenvector.
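The connection is easy to illustrate in the single-agent case: under the Oja flow x' = Vx - (x^T V x)x, a generic initial vector converges to the principal eigenvector of a symmetric matrix V. A minimal NumPy sketch with an illustrative 3x3 matrix (not taken from the paper):

```python
import numpy as np

# Euler discretization of the Oja flow  x' = V x - (x^T V x) x,
# which drives a generic initial x toward the principal eigenvector
# of a symmetric V (here standing in for the value matrix).
V = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])   # symmetric; eigenvalues 1, 2, 4

x = np.array([1., 1., 1.]) / np.sqrt(3.0)   # start not orthogonal to v1
dt = 0.01
for _ in range(5000):
    x = x + dt * (V @ x - (x @ V @ x) * x)

v1 = np.array([1., 2., 1.]) / np.sqrt(6.0)  # principal eigenvector (eigenvalue 4)
alignment = abs(x @ v1) / np.linalg.norm(x)  # approaches 1 as x aligns with v1
```

Components along the non-principal eigenvectors decay at a rate set by the spectral gap, so after integration time 50 the alignment is 1 to machine precision.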


Dual Perspectives on Non-Contrastive Self-Supervised Learning

Ponce, Jean, Terver, Basile, Hebert, Martial, Arbel, Michael

arXiv.org Artificial Intelligence

The stop gradient and exponential moving average iterative procedures are commonly used in non-contrastive approaches to self-supervised learning to avoid representation collapse, with excellent performance in downstream applications in practice. This presentation investigates these procedures from the dual viewpoints of optimization and dynamical systems. We show that, in general, although they do not optimize the original objective, or any other smooth function, they do avoid collapse. Following Tian et al. (2021), but without any of the extra assumptions used in their proofs, we then show using a dynamical system perspective that, in the linear case, minimizing the original objective function without the use of a stop gradient or exponential moving average always leads to collapse. Conversely, we characterize explicitly the equilibria of the dynamical systems associated with these two procedures in this linear setting as algebraic varieties in their parameter space, and show that they are, in general, asymptotically stable. Our theoretical findings are illustrated by empirical experiments with real and synthetic data.

Self-supervised learning (or SSL) is an approach to representation learning that exploits the internal consistency of training data without requiring expensive annotations. However, non-contrastive approaches to SSL (Assran et al., 2023; Bardes et al., 2022), which take as input different views of the same data samples and learn to predict one view from the other, are susceptible to representational collapse, where a constant embedding is learned for all data points (LeCun, 2022). In this presentation we use the dual viewpoints of optimization and dynamical systems to study, theoretically and empirically, the well-known stop gradient (Chen and He, 2021) and exponential moving average (Grill et al., 2020) training procedures that are specifically designed to avoid this problem.
Here C is the global minimum of E(θ, ψ) (shown as negative instead of zero for readability) associated with a collapse of the training process; B is a nontrivial local minimum one may reach using an appropriate regularization to avoid collapse; and A is a limit point of the stop gradient (SG) training procedure associated with parameters θ and ψ at convergence. In general, it is not a minimum of E and thus does not correspond to a collapse of the training process, but it is a minimum with respect to ψ of E(θ, ψ).
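Both behaviors can be seen in the simplest linear instance of this setting, reduced here to a scalar encoder w and scalar predictor p, where the expected gradient flow has a closed form. A toy Euler-step sketch (all constants illustrative, not from the paper): the full gradient collapses w to zero, while the stop-gradient dynamics settle on a nontrivial equilibrium lying on a curve of equilibria in (w, p) space, echoing the characterization of equilibria as algebraic varieties.

```python
# Toy scalar version of linear non-contrastive SSL: encoder w, predictor p,
# two views x1 = x + n1, x2 = x + n2 with Var(x) = sx2, Var(n) = sn2.
# Stop-gradient loss: E[(p*w*x1 - sg(w*x2))^2] / 2.
# We integrate the expected gradient flow with Euler steps.
sx2, sn2 = 1.0, 0.1
s = sx2 + sn2                       # E[x1^2] = E[x2^2]; E[x1*x2] = sx2
dt, steps = 0.01, 20000

def run(stop_grad):
    w, p = 1.0, 0.2
    for _ in range(steps):
        if stop_grad:
            # gradient flows only through the online branch p*w*x1
            gw = w * p * (p * s - sx2)
        else:
            # full gradient of E[(p*w*x1 - w*x2)^2] / 2 with respect to w
            gw = w * (p * p * s - 2 * p * sx2 + s)
        gp = w * w * (p * s - sx2)
        w, p = w - dt * gw, p - dt * gp
    return w, p

w_sg, p_sg = run(True)       # w stays away from 0; p -> sx2 / s
w_full, p_full = run(False)  # w -> 0: collapse
# note: every (w, sx2/s) is an equilibrium of the stop-gradient flow,
# i.e., a curve (algebraic variety) of equilibria in (w, p) space.
```

Without the stop gradient, the w-dynamics decay at rate E[(p*x1 - x2)^2] > 0, so collapse is unavoidable; with it, the decay factor p*(p*s - sx2) changes sign at p = sx2/s, which is exactly the equilibrium curve.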




No-Regret Learning and Mixed Nash Equilibria: They Do Not Mix

Neural Information Processing Systems

As such, several crucial questions arise: What are the game-theoretic implications of the no-regret guarantees of FTRL? Do the dynamics of FTRL converge to an equilibrium of the underlying game? A folk answer to this question is that "no-regret learning converges to equilibrium in all games".
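A standard way to see the tension behind the title: the continuous-time dynamics of FTRL with an entropic regularizer are the replicator dynamics, and on Matching Pennies they orbit the unique (fully mixed) Nash equilibrium rather than converge to it. A small NumPy sketch (payoffs, initial strategies, and step size are illustrative):

```python
import numpy as np

# Replicator dynamics -- the continuous-time FTRL dynamics induced by an
# entropic regularizer -- on Matching Pennies. The unique Nash equilibrium
# is mixed, (1/2, 1/2) for each player; the dynamics cycle around it.
A = np.array([[1., -1.],
              [-1., 1.]])          # player 1's payoff matrix; player 2 gets -A

x = np.array([0.8, 0.2])           # player 1's mixed strategy
y = np.array([0.6, 0.4])           # player 2's mixed strategy
dt, dists = 1e-3, []
for _ in range(40000):
    u = A @ y                      # pure-strategy payoffs of player 1
    v = -A.T @ x                   # pure-strategy payoffs of player 2
    x = x + dt * x * (u - x @ u)   # replicator update (Euler step)
    y = y + dt * y * (v - y @ v)
    x, y = x / x.sum(), y / y.sum()
    dists.append(abs(x[0] - 0.5) + abs(y[0] - 0.5))

# the distance to the mixed equilibrium stays bounded away from zero
```

The continuous flow conserves the KL divergence to the equilibrium, so the orbit is a closed cycle; the Euler discretization, if anything, drifts slightly outward, so the trajectory never approaches the mixed equilibrium.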